ABSTRACT

Over the past few decades, educators in general, and language teachers in particular, have become more inclined towards using
testing techniques that resemble real-life language performance. Unlike traditional paper-and-pencil language tests,
which require test-takers to attempt tests based on artificial and contrived language content, performance
tests are authentic: the test-taker is asked to perform language tasks that he or she will need to perform in real-life
interactions. A very valuable type of performance test is portfolio assessment, in which a record of students'
performance across a wide range of language tasks over a logical period of time is kept so that a profile of performance
can be obtained for the evaluation of achievement. This article defines performance assessment, traces its origins and
development, explains how performance tests can be constructed, and describes the nature and advantages of
portfolios.

INTRODUCTION

Some ten or fifteen years ago, few people questioned the
widespread use of standardized achievement tests.
After all, standardized achievement tests take relatively
little time to administer and are inexpensive. In addition,
the results are simple to report and understand. Often a
single score is reported for each student, and aggregate
scores are reported for a classroom. Finally, and very 
significantly, standardized achievement tests are 
promoted as "objective" measures of achievement, 
meaning that the results are not affected by the personal 
values or biases of the person who scores the test. 

For the past few years, however, language testing 
scholars have called for dramatic changes in how we 
assess what students know and are able to do. They have 
directed most of their criticism at the widespread use of 
standardized achievement tests. However, many 
teacher-made tests and tests found in textbooks have 
similar weaknesses and limitations. Those who propose 
changes in assessment rest their argument on the 
premise that what we assess, and how we assess it, affects 
both what is taught and the way it is taught. Critics of 
current assessment practices argue that the ultimate 
goal of assessment should be to have students who can
create, reflect, solve problems, collect and use
information, and formulate interesting and worthwhile
questions. They therefore argue that our assessments
must measure the extent to which students have
mastered these types of knowledge and skills. They
propose what is commonly called Performance
Assessment (PA) or, as Flynn (2008) calls it,
Performance-Based Assessment (PBA). Performance Assessment may
also be taken as synonymous with what, in the education
literature, has been called Curriculum-Based
Measurement (CBM) (Deno, 2003).

Performance-based assessment utilizes tasks conducted 
by students that enable them to demonstrate what they 
know about a given topic. The difference between PBA 
and the more traditional methods of testing is that, in PBA,
students are given the opportunity to better
communicate what they have already learnt (Flynn,
2008). CBM is an approach for assessing the growth of
students in basic skills that originated uniquely in special
education. A substantial research literature has
developed to demonstrate that CBM can be used
effectively to gather student performance data to
support a wide range of educational decisions, such as
screening to identify, evaluating prereferral interventions,
determining eligibility for and placement in remedial and
special education programs, formatively evaluating
instruction, and evaluating reintegration and inclusion of
students in mainstream programs (Deno, 2003). To
provide an accurate reading of students' and schools'
rates of progress, and to provide cues for instruction,
assessment at every level should be connected to explicit
learning goals and standards (Niemi, Baker, and
Sylvester, 2007).

The idea behind performance assessment is not to say 
that concepts, facts, definitions, dates, names, and 
locations have no place in education. However, as the
critics of traditional assessment practices point out, many
of our assessment practices place too much emphasis
on assessing content and give far too little attention to the
skills and knowledge. They also argue that we must no
longer treat assessment as fundamentally separate from
instruction. If curriculum, instruction, and assessment are
integrated, the assessment itself becomes a valuable
learning experience. They conclude that, by requiring
students to complete high-quality performance tasks, we
have the potential to bring about significant and positive
changes in instruction and learning. This article provides a
useful review of performance assessment in language 
programs. 

1. Background

Language testing has always followed linguistic theories 
of the time. Thus, the communicative era in the 1970s 
generated a wave of criticism of the traditional non- 
communicative tests. These tests were seen as being 
limited in their concept and as producing artificial 
language, as opposed to the language normally 
produced by human beings. For example, the kind of 
tests used for testing oral language included mostly 
mechanical repetition of words and sentences and the 
supplying of pattern answers to pattern questions. In
subsequent years there was a shift in language testing
towards the development and use of tests that resembled
features of real language use and that required test takers
to perform language that was authentic, direct,
communicative, and performance-based. Such tests, it
was believed, would better reflect 'real life' language use
as they would tap a broader construct of 'what it means to
know a language.' A number of terms were used along
with these types of tests. Clark (1975) referred to 'direct
tests' in which both the testing format and the procedure
duplicate, as closely as possible, 'the setting and
operation of real life' situations in which language
proficiency is normally demonstrated. Jones (1977)
proposed performance tests in which test takers provide
information on functional language ability. Morrow (1977)
recommended tests that would offer test takers the
opportunity for spontaneous language use in authentic
settings and activities which the candidate would
recognize as relevant. Canale and Swain (1980) referred
to performance-based communicative tests which
required test takers to perform language while
considering criteria such as saying the right thing, at the
right time, to the right person. The Foreign Service Institute
(FSI) Oral Interview (OI) test was the most relevant example
of such a direct, performance-based test (Clark, 1975; 
Jones, 1977), requiring test takers to use language in a 
face-to-face oral interaction. The tester asked questions 
on a variety of topics, and the test taker provided the oral 
language sample which was then evaluated by the tester 
with the aid of a rating scale. 

In this way, 'performance' became one feature among a 
number of others, such as 'direct,' 'functional,' and 
'authentic,' all of which characterized communicative 
tests of that era. The unique aspect of the 'performance' 
feature was that test-takers were expected to replicate, as
much as possible, the type of language used in non- 
testing situations (Bachman, 1990; Bailey, 1985). Thus, 
performance testing referred to tests where a test taker is 
tested on what s/he can do in the second language in 
situations similar to 'real life.' Jones (1985) specified that 
such tests also required the application of prior learning 
experiences in an actual or simulated setting where either 
the test stimulus, the desired response, or both were 
intended to lend a high degree of realism to the test 
situation. 

The above description characterized features of 
performance tests in the 1970s. In the 1980s,
performance testing became associated more with
specific tasks and contexts of professional preparation
and certification, mostly in the workplace (Wesche, 1992).
In this context, performance testing borrowed from the
field of vocational testing, in which a test taker needs to
carry out realistic tasks applying language skills in actual
or simulated settings (Carroll and Hall, 1985). The criteria
used to evaluate the performance were an approximation
of the way performance would be judged in the specific
and actual target circumstances, including adequate
fulfillment of tasks. Wesche (1992) notes that these tests
tap both second language ability and the ability to fulfill
nonlinguistic requirements of the given tasks. With these
types of tests, the main psychometric feature is that of
predictive validity: the tests predict how well a test taker
will perform under real conditions in a specific context
(Jones, 1985). The underlying assumption with these types
of performance tests is that nonlinguistic factors are
present in any language performance; consequently, it is
important to understand their role and channel their
influence on language performance.

In this regard, McNamara (1996) has proposed a
distinction between strong and weak hypotheses on
performance tests. In the strong sense, knowledge of the
second language is a necessary but not a sufficient
condition for success on the performance-test tasks:
success is measured in terms of performance on the task,
and not only in terms of knowledge of language. In the
weak sense, knowledge of the second language is the
most important, and sometimes the only, factor relevant
for success on the test. The specific contexts in which
performance testing is used involve a clientele (students,
employees, etc.) with certain shared second language
needs that can be identified and described, and that
can subsequently be translated into test tasks and overall
test design. Performance testing, therefore, is associated
with a specific context, and its strongest requirement will
be a detailed description of that context and the
language performances associated with it (Sajavaara,
1992; Wesche, 1992).

Jones (1985) distinguished among three types of 
performance tests according to the degree to which the
tasks require actual performances: (a) Direct Assessment,
(b) Work-Place Assessment, and (c) Simulation. In a 'direct'
assessment, the examinee is placed in the actual target
context, and the second language performance is
assessed in response to the naturally evolving situation. In
the 'work sample' type, there is a realistic task which is
generally set in the target context; this type enables
control of the elicitation task and a comparison of the
performance of different examinees while simultaneously
retaining contextual realism. The 'simulation' type creates
simulation settings and tasks in such a way that they
represent what are thought to be pertinent aspects of the
real-life context. 'Role playing' is frequently used as a
simulation technique where both the examiner and the
examinee play roles. There have also been a number of
efforts to use devices such as video, audio recorders, and
telephones. For all these types, however, it should be
clear that it is never possible to satisfy all the conditions of
performance communication and contextual grounding
since testing is not really a normal activity. Recognizing this
fact, more recent techniques utilize a variety of non- 
testing procedures that reflect the real performance 
context: these include record reviews, portfolios, self 
assessment, participant and non-participant 
observations, and external indicators. 

Wesche (1992) differentiated between performance 
testing in the work-place and in the instructional context. 
In the work-place context, tests are used for job 
certification and for prediction of post-training behavior. 
In the instructional context, tests are used for washback, 
diagnostic feedback, and increasing students' 
motivation. Early introduction of performance tests can 
help communicate to learners the importance of 
language objectives, instructors' expectations, and
criteria for judging performances. Texts and tasks which 
are used in performance testing also make very good 
instructional tasks, and ratings obtained from 
performance tests can be translated to diagnostic 
feedback in the form of profile scores. Thus, performance 
tests can actually be introduced in the pre-instruction 
phase for placement, formative diagnosis, and 
achievement purposes; during the program itself, these 
tests can be used for achievement purposes, for
summative testing at the end of a program, and for
certification purposes. In instructional situations where the
goals are based on an analysis of large language needs,
there is a place in the curriculum for an evaluation system 
which includes performance-type tasks. 

2. Construction of performance tests
In constructing a performance test, a needs analysis is
conducted in order to provide a detailed description of 
the specific context and tasks which learners will need to 
perform, the specific conditions under which these tasks 
will be performed, and the criteria against which the 
performance can be judged. Then, the learners' 
performances can be judged over a range of tasks that 
need to be sampled, using a variety of instruments and 
procedures. The needs analysis will specify the context of 
the second language use, the type of interactions 
foreseen, the roles, discourse types, and language 
functions to be performed, and the basis on which 
successful fulfillment of the second language tasks is to 
be judged. It is with respect to these needs that the 
performance test is designed, texts and tasks are 
selected, and evaluation criteria are determined. These
are then translated into appropriate test objectives and 
tasks, and later into actual test design and scoring. 
Performance tests are generally assessed with the aid of 
rating scales which describe what a person can do with 
the language in specific situations. 

There are a number of questions that need to be 
addressed in constructing performance tests: How can 
the evaluation criteria reflect the kinds of judgments and 
consequences that the performance would entail? What 
relative weighting should be given to the different criteria? 
How can the scoring information be interpreted and 
presented so as to give maximum information back to the 
test users? There are also questions more generally 
related to the criteria by which the performance should 
be judged: What is the proportion of 'language' vs. 
'domain knowledge' to be assessed? Who should be the 
judge to assess the performance - a native speaker, a 
domain specialist, or a teacher? Although most 
performance tests do use the native speaker as the top 
level of the scale (Emmett, 1985), this issue has been a
topic of debate in the language testing literature for many
years (Alderson, 1980; Bachman, 1990). Hamilton et al.
(1993) claim that performance on a test involves factors 
other than straight second language proficiency that 
cause an overlap in the performance of native and non- 
native speakers. Therefore, the reference to native 
speaker performance is unwarranted. 

In the past few years, performance testing has become a 
common form of assessment in the educational research 
context. It is associated with any procedure not 
employing paper-and-pencil multiple choice items, and 
it includes a variety of assessment alternatives such as 
open ended responses, constructed responses, problem 
solving tasks, essays, hands-on science problems, 
computer simulations of real world problems, exhibits, 
and portfolios of students' work (Linn, Baker, and Dunbar,
1991).

In its simplest terms, a performance assessment is one 
which requires students to demonstrate that they have 
mastered specific skills and competencies by performing 
or producing something. Advocates of performance 
assessment call for alternative tests that measure 
students' ability to perform specific tasks. Such tasks might 
include (a) designing and carrying out experiments, (b) 
writing essays, (c) working with other students, (d) writing 
term papers, and so on. 

Advocates of performance assessment maintain that
every task must have performance criteria for at least two 
reasons. On the one hand, the criteria define for students 
and others the type of behavior or attributes of a product 
which are expected. On the other hand, a well-defined 
scoring system allows the teacher, the students, and 
others to evaluate a performance or product as 
objectively as possible. If performance criteria are well 
defined, another person acting independently will award 
a student essentially the same score. Furthermore, well- 
written performance criteria will allow the teacher to be 
consistent in scoring over time. If a teacher fails to have a 
clear sense of the full dimensions of performance, 
ranging from poor or unacceptable to exemplary, he or 
she will not be able to teach students to perform at the 
highest levels or help students to evaluate their own
performance. As such, performance-based assessments
require individuals to apply their knowledge and skills in
context, not merely to complete a task on cue (Brualdi,
2001).
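
The claim that well-defined criteria lead an independent scorer to award essentially the same score can be checked empirically. The following Python sketch is an illustration only and is not drawn from the sources cited above. It assumes that two raters have each assigned a score point on a hypothetical four-point scale to the same set of essays, and it computes the proportion of exact agreement together with Cohen's kappa, a standard chance-corrected agreement index. The function names and the sample scores are invented for the example.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Share of performances that both raters placed at the same score point."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of performances")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same
    # score point if each assigned scores independently.
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score point throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1 = poor ... 4 = exemplary) given independently to eight essays.
teacher = [4, 3, 3, 2, 1, 4, 2, 3]
colleague = [4, 3, 2, 2, 1, 4, 2, 3]

print(f"exact agreement: {exact_agreement(teacher, colleague):.2f}")
print(f"Cohen's kappa:   {cohens_kappa(teacher, colleague):.2f}")
```

Values of kappa close to 1 suggest that the criteria are being applied consistently, whereas low values suggest that the descriptors need sharpening. For ordered scales such as this one, a weighted form of kappa may be more appropriate than the unweighted version sketched here.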

In developing performance criteria, one must both
define the attribute(s) being evaluated and develop
a performance continuum. For example, one attribute in
the evaluation of writing might be writing mechanics,
defined as the extent to which the student correctly uses
proper grammar, punctuation, and spelling. As for the
performance dimension, it can range from high quality
(well-organized, good transitions with few errors) to low
quality (so many errors that the paper is difficult to read
and understand). Testers should keep in mind that the key
to developing performance criteria is to place oneself in
the hypothetical situation of having to give feedback to a
student who has performed poorly on a task. Advocates
of performance assessment suggest that a teacher
should be able to tell the student exactly what must be
done to receive a higher score. If performance criteria
are well defined, the student then will understand what he
or she must do to improve. It is possible, of course, to
develop performance criteria for almost any of the
characteristics or attributes of a performance or product.
However, experts in developing performance criteria
warn against evaluating only those aspects of a performance
or product which are easily measured. Ultimately,
performances and products must be judged on those
attributes which are most crucial.

Developing performance tasks or performance 
assessments seems reasonably straightforward, for the 
process consists of only three steps. The reality, however, is 
that quality performance tasks are difficult to develop.
With this caveat in mind, the three steps are:

1. Listing the skills and knowledge the teacher wishes to
have students learn as a result of completing a task. As
tasks are designed, one should begin by identifying the
types of knowledge and skills students are expected to
learn and practice. These should be of high value, worth
teaching to, and worth learning. In order to be authentic,
they should be similar to those which are faced by adults
in their daily lives and work.

2. Designing a performance task which requires the
students to demonstrate these skills and knowledge. The
performance tasks should motivate students. They should
also be challenging, yet achievable. That is, they must be
designed so that students are able to complete them
successfully. In addition, one should seek to design tasks
with sufficient depth and breadth so that valid
generalizations about overall student competence can
be made.

3. Developing explicit performance criteria which
measure the extent to which students have mastered the
skills and knowledge. It is recommended that there be a
scoring system for each performance task. The
performance criteria consist of a set of score points which
define in explicit terms the range of student performance.
Well-defined performance criteria will indicate to students
what sorts of processes and products are required to show
mastery and also will provide the teacher with an
"objective" scoring guide for evaluating student work. The
performance criteria should be based on those attributes
of a product or performance which are most critical in
attaining mastery. It is also recommended that students
be provided with examples of high-quality work, so
that they can see what is expected of them.
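
The scoring system described in the third step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from the literature reviewed here: the two criteria, their relative weights, the four-point scale, and the wording of the intermediate descriptors are all hypothetical (only the top and bottom descriptors echo the writing example given earlier). Each criterion carries a set of score points with explicit descriptors, and the function returns both a profile of per-criterion scores, which can serve as diagnostic feedback, and a weighted overall score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float      # relative importance; normalized when scoring
    descriptors: dict  # score point -> explicit description of that level


# Hypothetical rubric for a writing task, scored 1 (low) to 4 (high).
RUBRIC = [
    Criterion("organization", 2.0, {
        4: "well organized, good transitions",
        3: "clear overall structure, some abrupt transitions",
        2: "structure hard to follow in places",
        1: "no discernible organization",
    }),
    Criterion("writing mechanics", 1.0, {
        4: "few errors in grammar, punctuation, and spelling",
        3: "occasional errors that do not obscure meaning",
        2: "frequent errors that sometimes obscure meaning",
        1: "so many errors that the paper is difficult to read",
    }),
]


def score_performance(rubric, ratings):
    """Return (profile, weighted_total) for one student's ratings.

    `ratings` maps criterion name -> awarded score point. The profile keeps
    per-criterion detail for feedback; the total is the weighted mean.
    """
    profile = {}
    weighted_sum = 0.0
    total_weight = 0.0
    for criterion in rubric:
        point = ratings[criterion.name]
        if point not in criterion.descriptors:
            raise ValueError(f"{point!r} is not a score point for {criterion.name}")
        profile[criterion.name] = {
            "score": point,
            "descriptor": criterion.descriptors[point],
            # Descriptor of the next level up; None when already at the top.
            "to_improve": criterion.descriptors.get(point + 1),
        }
        weighted_sum += criterion.weight * point
        total_weight += criterion.weight
    return profile, weighted_sum / total_weight


profile, total = score_performance(RUBRIC, {"organization": 3, "writing mechanics": 2})
for name, detail in profile.items():
    print(f"{name}: {detail['score']} ({detail['descriptor']})")
    if detail["to_improve"]:
        print(f"  next level: {detail['to_improve']}")
print(f"weighted total: {total:.2f}")
```

Reporting the descriptor of the next level alongside each score is one way of telling the student exactly what must be done to receive a higher score. The choice of weights remains a judgment that the test designer must make and justify.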

3. Portfolios in performance assessment 
Proponents of performance assessment also advocate 
the use of student portfolios. In doing so, they also remind 
us that a portfolio is more than a folder stuffed with student
papers, video tapes, progress reports, or related
materials. As such, portfolios provide the teacher with a
source for the summative evaluation of the students. A
portfolio must be a purposeful collection of student work that tells
the story of a student's efforts, progress, or achievement in
a given area over a period of time. If it is to be useful,
specific design criteria also must be used to create and
maintain a portfolio system.

Advocates of portfolios suggest two reasons for their use. 
The first reason reflects dissatisfaction with the kind of 
information typically provided to students, parents, 
teachers, and members of the community about what 
students have learned or are able to do. Secondly, it is
argued that a well-designed portfolio system, which
requires students to participate in the selection process
and to think about their work, can accomplish several
important purposes. For instance, it can motivate
students. It can provide explicit examples to parents,
teachers, and others of what students know and are able
to do. It allows students to chart their growth over time and
to self-assess their progress. It encourages students to
engage in self-reflection. 

Proponents of portfolios argue that the primary worth of
portfolios is that they allow students the opportunity to
evaluate their work. Further, portfolio assessment offers
students a way to take charge of their learning. In other
words, portfolio assessment encourages ownership,
pride, and high self-esteem. Language teachers and
testers should keep in mind that several decisions must be
addressed prior to establishing a portfolio system. There
must be a physical and a conceptual structure. The
physical structure refers to the actual arrangement of
documents used to demonstrate student progress. The
conceptual structure refers to the underlying goals for
student learning. In this connection, numerous questions
need to be addressed: Who is the intended audience for
the portfolios: parents, administrators, or other teachers?
What will this audience want to know about student
learning? Will the selected documents show aspects of
student growth that test scores don't capture? What kinds
of evidence will best show student progress toward the
identified learning goals? Will the portfolio contain best
work only, a progressive record of student growth, or both?
If portfolios are to be evaluated, the evaluation standards
should be set before the portfolio system is
established. As for the evaluation itself, portfolios can be
evaluated in terms of standards of excellence or on
growth demonstrated within an individual portfolio, rather
than on comparisons made among different students'
work. The final decision has to do with what is done
with portfolios at the end of the course. They could, of
course, be turned over to students. However, there are
advantages to keeping portfolios over a long period of
time and sharing them with other teachers. Portfolios give
the teacher opportunities to promote continuity in
students' education. By passing a portfolio on,
a teacher can share important information with
the student's next teacher. Portfolios should be kept for
long periods of time (several years), and they should act as
a type of passport as a student moves from one level of
instruction to another.
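
The two bases for evaluating a portfolio mentioned above, standards of excellence and growth within an individual portfolio, can be made concrete with a short Python sketch. It is offered only as an illustration: the `Entry` record, the function names, the sample entries, and the assumption that every piece has been scored on one shared four-point scale are all hypothetical and do not come from the portfolio literature itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    collected_on: date
    task: str
    score: int       # score point on a shared, hypothetical 1-4 scale
    reflection: str  # the student's own comment on the piece


def growth(entries):
    """Score change from the earliest to the latest entry in one portfolio."""
    ordered = sorted(entries, key=lambda e: e.collected_on)
    return ordered[-1].score - ordered[0].score


def meets_standard(entries, standard):
    """True if the student's best piece reaches a fixed standard of excellence."""
    return max(e.score for e in entries) >= standard


def best_work(entries, count=1):
    """The highest-scoring pieces, for a 'best work only' portfolio."""
    return sorted(entries, key=lambda e: e.score, reverse=True)[:count]


# Hypothetical portfolio for one student, collected over a school year.
portfolio = [
    Entry(date(2023, 11, 3), "oral interview", 3, "I hesitated less this time."),
    Entry(date(2023, 9, 15), "personal letter", 2, "My spelling needs work."),
    Entry(date(2024, 1, 20), "argumentative essay", 4, "My transitions improved."),
]

print("growth within portfolio:", growth(portfolio))
print("meets standard of 4:", meets_standard(portfolio, standard=4))
print("best work:", [e.task for e in best_work(portfolio)])
```

Note that neither function compares one student with another; each looks only at a single student's collection, in keeping with the recommendation that portfolios not be judged by comparisons among different students' work.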

Conclusion 

Performance assessment, although a somewhat recent 
approach in language testing, is gathering momentum 
and size in much the same way as a snowball would do 
when moving downhill. Nowadays, language educators
do not question its importance and applicability in 
language programs. Its broad scope allows both 
teachers and students to envisage a clearer picture of 
success and achievement. 

It is quite safe to claim that the logical
conclusion of using performance assessment in
language programs is students' self-evaluation of their
own success. When students are given the opportunity to
perform a wide range of tasks in a wide range of situations and
contexts over a long period of time, their
portfolios accumulate and can then be used as a
basis on which their performance can be
judged. It is, therefore, recommended that language
teachers give more credence to performance 
assessment in their profession. 

As Verhoeven and Nico (2002) rightly noticed, one point of
caution with the implementation of performance
assessment in education in general, and in language
programs in particular, is that curricular innovations that are
based on Performance Assessment might be
represented in teachers' professional rhetoric, but not in
teacher-made school examinations. This indicates that
Performance Assessment may remain at a theoretical level
and may not turn up in classroom practice. It is therefore
vital that curriculum developers find ways of
guaranteeing the practical side of performance
assessment in curricula.